12,600 research outputs found

    Use of pre-transformation to cope with outlying values in important candidate genes

    Outlying values in predictors often strongly affect the results of statistical analyses in high-dimensional settings. Although they frequently occur with most high-throughput techniques, the problem is often ignored in the literature. We suggest using a very simple transformation, proposed earlier in a different context by Royston and Sauerbrei, as an intermediate step between array normalization and high-level statistical analysis. This straightforward univariate transformation identifies extreme values and considerably reduces the influence of outlying values in all further steps of statistical analysis without eliminating the incriminated observation or feature. The use of the transformation and its effects are demonstrated for diverse univariate and multivariate statistical analyses using nine publicly available microarray data sets.

    iPACOSE: an iterative algorithm for the estimation of gene regulation networks

    In the context of Gaussian Graphical Models (GGMs) with high-dimensional small sample data, we present a simple procedure to estimate partial correlations under the constraint that some of them are strictly zero. This method can also be extended to covariance selection. If the goal is to estimate a GGM, our new procedure can be applied to re-estimate the partial correlations after a first graph has been estimated, in the hope of improving the estimation of non-zero coefficients. In a simulation study, we compare our new covariance selection procedure to existing methods and show that the re-estimated partial correlation coefficients may be closer to the real values in important cases.

    Significance of floods in metal dynamics and export in a small agricultural catchment

    High-resolution monitoring of water discharge and water sampling were performed between early October 2006 and late September 2007 in the Montoussé River, a permanent stream draining an experimental agricultural catchment in the Gascogne region (SW France). Dissolved and particulate concentrations of major elements and trace metals (i.e. Al, Fe, Mn, As, Cd, Cr, Cu, Ni, Pb, Sc and Zn) were examined. Our results showed that contamination levels were deficient to moderate, as a result of sustainable agricultural practices. Regarding dynamics, metal partitioning between particulate and dissolved phases was altered during flood conditions: the particulate phase was diluted by coarser and less contaminated particles from the river bottom and banks, whereas the liquid phase was rapidly enriched owing to desorption mechanisms. Soluble/reactive elements were washed off from soils at the beginning of the rain episode. The contribution of the flood event of May 2007 (by far the most significant episode over the study period) to the annual metal export was considerable for particulate forms (72–82%) and moderate for dissolved elements (0–20%). The hydrological functioning of the Montoussé stream poses a dual threat to ecosystems, the consequences of which differ in both temporal and spatial scale: (i) desorption processes at the beginning of floods locally induce a rapid enrichment of waters in bioavailable metals (up to 3.4-fold the pre-flood signatures on average for the event of May 2007), and (ii) labile metals, enriched by anthropogenic sources and associated with particles (mainly via carbonates and Fe/Mn oxides), were predominantly transferred during floods into downstream-connected rivers.

    Iterative reconstruction of high-dimensional Gaussian Graphical Models based on a new method to estimate partial correlations under constraints.

    In the context of Gaussian Graphical Models (GGMs) with high-dimensional small sample data, we present a simple procedure, called PACOSE (standing for PArtial COrrelation SElection), to estimate partial correlations under the constraint that some of them are strictly zero. This method can also be extended to covariance selection. If the goal is to estimate a GGM, our new procedure can be applied to re-estimate the partial correlations after a first graph has been estimated, in the hope of improving the estimation of non-zero coefficients. This iterated version of PACOSE is called iPACOSE. In a simulation study, we compare PACOSE to existing methods and show that the re-estimated partial correlation coefficients may be closer to the real values in important cases. In addition, we show on simulated and real data that iPACOSE has very interesting properties with regard to sensitivity, positive predictive value and stability.
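
As background for the partial correlations mentioned in this abstract, the sketch below uses the standard identity that links them to the inverse covariance (precision) matrix: the partial correlation between variables i and j is -w_ij / sqrt(w_ii * w_jj), where w denotes the precision matrix entries. This is not the PACOSE procedure. It assumes an invertible covariance matrix, which the sample covariance is not when there are fewer samples than variables, and the function name and the three-variable chain are illustrative assumptions.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation matrix from an invertible covariance matrix.

    Illustrative helper (not from the paper): inverts the covariance and
    rescales the precision matrix entries.
    """
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy chain X1 - X2 - X3: the zero entries in the precision matrix mean that
# X1 and X3 are conditionally independent given X2 (no edge in the graph).
precision_true = np.array([[2.0, -1.0, 0.0],
                           [-1.0, 2.0, -1.0],
                           [0.0, -1.0, 2.0]])
cov = np.linalg.inv(precision_true)
pcor = partial_correlations(cov)
```

A zero partial correlation corresponds to a missing edge in the GGM, which is the kind of constraint the abstract describes imposing before re-estimating the remaining coefficients.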

    Deuterated molecules in DM Tau: DCO+, but no HDO

    We report the detection of the J=2-1 line of DCO+ in the proto-planetary disk of DM Tau and re-analyze the spectrum covering the 465 GHz transition of HDO in this source, recently published by Ceccarelli et al. (2005). A modelling of the DCO+ line profile with the source parameters derived from high resolution HCO+ observations yields a DCO+/HCO+ abundance ratio of about 0.004, an order of magnitude smaller than that derived in the low mass cores. The re-analysis of the 465 GHz spectrum, using the proper continuum flux (0.5 Jy) and source systemic velocity (6.05 km/s), makes it clear that the absorption features attributed to HDO and C6H are almost certainly unrelated to these species. We show that the line-to-continuum ratio of an absorption line in front of a Keplerian disk can hardly exceed the ratio of the turbulent velocity to the projected rotation velocity at the disk edge, unless the line is optically very thick (tau > 10 000). This ratio is typically 0.1-0.3 in proto-planetary disks and is about 0.15 in DM Tau, much smaller than that for the alleged absorption features. We also show that the detection of H2D+ in DM Tau, previously reported by these authors, is only a 2-sigma detection when the proper velocity is adopted. So far, DCO+ is thus the only deuterated molecule clearly detected in proto-planetary disks

    SHrinkage Covariance Estimation Incorporating Prior Biological Knowledge with Applications to High-Dimensional Data

    In "-omic data" analysis, information on the structure of covariates is broadly available either from public databases describing gene regulation processes and functional groups, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), or from statistical analyses, for example in the form of partial correlation estimators. The analysis of transcriptomic data might benefit from the incorporation of such prior knowledge. In this paper we focus on the integration of structured information into statistical analyses in which at least one major step involves the estimation of a (high-dimensional) covariance matrix. More precisely, we revisit the recently proposed "SHrinkage Incorporating Prior" (SHIP) covariance estimation method, which takes into account the group structure of the covariates, and suggest integrating the SHIP covariance estimator into various multivariate methods such as linear discriminant analysis (LDA), global analysis of covariance (GlobalANCOVA), and regularized generalized canonical correlation analysis (RGCCA). We demonstrate the use of the resulting new methods based on simulations and discuss the benefit of the integration of prior information through the SHIP estimator. Reproducible R code is available at http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/shipproject/index.html
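
To make the idea of shrinking a covariance estimate toward a structured target concrete, here is a minimal sketch of generic linear shrinkage, lam * T + (1 - lam) * S, where S is the sample covariance and T is a target built from group membership. It is not the SHIP estimator itself: the target used here (keep within-group entries of S, zero the rest), the fixed intensity `lam`, and all function names are assumptions made for illustration, whereas shrinkage methods typically choose the intensity from the data.

```python
import numpy as np

def group_target(S, groups):
    """Hypothetical target: keep entries of S within a group, zero elsewhere."""
    groups = np.asarray(groups)
    same_group = groups[:, None] == groups[None, :]
    return np.where(same_group, S, 0.0)

def shrink_covariance(S, target, lam):
    """Convex combination lam * target + (1 - lam) * S, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * target + (1.0 - lam) * S

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))   # 3 samples, 4 variables: S is singular
S = np.cov(X, rowvar=False)       # sample covariance, variables in columns
groups = [0, 0, 1, 1]             # assumed prior grouping of the variables
lam = 0.5                         # fixed here; an assumption, not estimated
T = group_target(S, groups)
S_shrunk = shrink_covariance(S, T, lam)
```

In this sketch, within-group covariances are left unchanged while between-group covariances are scaled by `1 - lam`, so the prior grouping decides which entries are trusted most.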

    Body dissatisfaction and body change strategies among adolescents : a longitudinal investigation

    This thesis examined body dissatisfaction and body change behaviors among adolescent girls and boys from a biopsychosocial framework. The contributions of biological, psychological and sociocultural factors were examined in relation to body dissatisfaction, weight loss, weight gain and increased muscle tone behaviors among early adolescent girls and boys. In particular, pubertal maturation, body mass index (BMI), perception of body shape and size and psychological factors, such as depression, anxiety, ineffectiveness, self-esteem and perfectionism, were examined as possible factors that may precipitate or maintain body dissatisfaction and engagement in body change strategies. The sociocultural factors evaluated were the quality of family and peer relationships, as well as the influence of family and peers in predicting the adoption of specific body change strategies. The specific mechanisms by which these influences were transmitted were also examined. These included perceived discussion, encouragement and modelling of various body change strategies, as well as perceived teasing about body shape and size. A number of separate cross-sectional and longitudinal studies were conducted to examine the above relationships and identify the factors that contribute to weight loss, weight gain and increased muscle tone behaviors in adolescents. Study 1 examined the psychometric properties and principal components structure of the Bulimia Test Revised (BULIT-R; Thelen, Farmer, Wonderlich, & Smith, 1991) to assess its applicability to adolescent samples. Study 2 investigated the nature of body dissatisfaction and weight loss behaviors among 603 adolescents (306 girls and 297 boys) using a standardised questionnaire. This preliminary study was conducted to ascertain whether variables previously found to be relevant to adolescent girls could also be related to the development of body dissatisfaction and weight loss behaviors among adolescent boys.
Studies 3 and 4 described the development and validation of a body modification scale that measured weight loss, weight gain and increased muscle tone behaviors. Studies 5 and 6 were designed to modify an Excessive Exercise Scale developed by Long, Smith, Midgley, and Cassidy (1993) into a shorter form, and validate this scale with an adolescent sample. Study 7 investigated the factors that contribute to weight loss, weight gain and increased muscle among adolescent girls and boys both cross-sectionally and longitudinally (over one year). Structural equation modelling was used to examine associations among self-reported body dissatisfaction, body change strategies and a range of biological, psychological and sociocultural variables both cross-sectionally and longitudinally. Overall, the results suggested that both girls and boys experience body dissatisfaction and engage in a number of different body change strategies in order to achieve an ideal size. A number of gender similarities and differences were identified in the expression of body dissatisfaction and the adoption of body change strategies for both girls and boys. Girls were more likely than boys to report body dissatisfaction and engage in weight loss behaviors, while boys were more likely than girls to engage in weight gain and increased muscle tone behaviors. Generally, the same factors were found to contribute to weight loss, and more specifically, bulimic symptomatology, and weight gain in both adolescent girls and boys. While a combination of biological, psychological and sociocultural factors contributed to bulimic symptomatology, only biological and psychological factors were found to contribute to weight gain in adolescents. The most notable gender differences were found in the model of increased muscle tone.
Sociocultural and biological factors contributed to increased muscle tone behaviors in girls, while sociocultural and psychological factors were implicated in these behaviors in adolescent boys. With the exception of the model of increased muscle tone for boys, body dissatisfaction was a consistent factor in the adoption of body change behaviors. Consistent with previous investigations, the present thesis provides empirical support for the need to examine the etiology and maintenance of such concerns and behaviors from a multifaceted perspective

    Over-optimism in bioinformatics: an illustration

    In statistical bioinformatics research, different optimization mechanisms potentially lead to "over-optimism" in published papers. The present empirical study illustrates these mechanisms through a concrete example from an active research field. The investigated sources of over-optimism include the optimization of the data sets, of the settings, of the competing methods and, most importantly, of the method's characteristics. We consider a "promising" new classification algorithm that turns out to yield disappointing results in terms of error rate, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. We quantitatively demonstrate that this disappointing method can artificially seem superior to existing approaches if we "fish for significance". We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should be validated using "fresh" validation data sets.